Abstract

Clustered regularly interspaced short palindromic repeat (CRISPR) RNA-guided adaptive immune 
systems that protect bacteria and archaea from infection by viruses are now being routinely repurposed 
for genome engineering in a wide variety of cell types and multicellular organisms. 



Introduction and context

Clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated genes (cas) are essential components of nucleic acid-based adaptive immune systems that are widespread in bacteria and archaea [1-5]. Similar to RNA interference (RNAi) pathways in eukaryotes, CRISPR-mediated immune systems rely on small RNAs for the sequence-specific delivery of dedicated nucleases to invading nucleic acids, such as viruses [2,5]. However, CRISPR systems are phylogenetically and mechanistically distinct from RNAi. Here we provide a brief overview of CRISPR-mediated immunity and explain how the emerging new properties of this defense system are being repurposed for genome engineering in bacteria [6-9], yeast [10,11], human cells [10,12-28], insects [29-32], fish [33-37], worms [38-46], plants [47-54], frogs [55], pigs [56], and rodents [14,57-61]. The advent of these new genome engineering techniques illustrates how basic research can lead to unexpected innovations with applications in environmental and medical sciences.

Discovering CRISPRs 

Each CRISPR locus consists of a series of short repeats that are separated by non-repetitive spacer sequences derived from foreign genetic elements (Figure 1). This conserved repeat-spacer-repeat architecture was originally observed in the Escherichia coli genome in 1987 [62], but the function of these repeats remained enigmatic until 2005, when three groups reported that the spacer sequences in CRISPR loci are often identical to sequences in bacteriophage (phage) genomes and plasmids [63-65]. These observations suggested that CRISPRs might be part of a novel nucleic acid-based immune system designed to protect bacteria and archaea from infection by viruses and other genetic parasites. To test this hypothesis, Barrangou et al. challenged cultures of Streptococcus thermophilus with different phages and then screened for phage-resistant mutants [66]. DNA sequencing of the CRISPR loci from phage-resistant strains of S. thermophilus revealed that the CRISPR locus contained new "spacers" that were derived from the invading phage DNA, and the number of new phage-derived spacers correlated with the degree of phage resistance [66]. In addition, Barrangou and colleagues demonstrated that phage resistance could be genetically enhanced or reduced through insertion or deletion (respectively) of phage-targeting spacer sequences, suggesting that CRISPR-based vaccination programs might be used to protect industrial strains of bacteria from common phage infections.
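The repeat-spacer-repeat architecture, and the 2005 observation that spacers match phage and plasmid sequences, can be illustrated with a minimal Python sketch. It assumes the repeat is known and matches exactly, and all sequences and names (REPEAT, "phage", "plasmid") are hypothetical toy data rather than real CRISPR loci; real analyses use alignment tools that tolerate mismatches.

```python
# Toy illustration: extract spacers from a CRISPR array and ask which
# (hypothetical) foreign genomes each spacer matches.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Split a repeat-spacer-repeat array on an exact, known repeat.

    Assumes the locus begins and ends with a repeat, so splitting yields
    empty strings at both ends, which are dropped.
    """
    return [part for part in locus.split(repeat) if part]


def spacer_origins(spacers: list[str], genomes: dict[str, str]) -> dict[str, list[str]]:
    """Map each spacer to the genomes that contain it on either strand."""
    origins = {}
    for spacer in spacers:
        rc = reverse_complement(spacer)
        origins[spacer] = [name for name, seq in genomes.items()
                           if spacer in seq or rc in seq]
    return origins


# Hypothetical toy sequences (not real CRISPR loci or phage genomes).
REPEAT = "GTTTCAGA"
locus = REPEAT + "AACCGGTTAC" + REPEAT + "TGCATGCAAG" + REPEAT
genomes = {
    "phage": "CCCC" + "AACCGGTTAC" + "GGGG",    # spacer 1 on the top strand
    "plasmid": "AAAA" + "CTTGCATGCA" + "TTTT",  # spacer 2 on the bottom strand
}

spacers = extract_spacers(locus, REPEAT)
origins = spacer_origins(spacers, genomes)
```

Checking both strands matters because a spacer can derive from either strand of the invader, as the toy plasmid shows.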

Stages of CRISPR-mediated defense 

Phylogenetic studies have identified distinct versions of the CRISPR system, but adaptive immunity in all of these systems proceeds in three distinct stages: acquisition of foreign DNA, CRISPR RNA (crRNA) biogenesis, and target interference [3,4,67] (Figure 1A). During new sequence acquisition, short fragments of foreign DNA are non-randomly selected and preferentially integrated at one end of the CRISPR locus [68-73]. The addition of each new "spacer" sequence is accompanied by the duplication of the terminal repeat, thus maintaining the repeat-spacer-repeat architecture of the CRISPR locus [72]. CRISPR loci are transcribed into a long primary transcript that is processed into a library of small crRNAs, each containing a "guide" sequence complementary to a previously encountered invader [74-79]. Each crRNA is bound by one or more CRISPR-associated (Cas) proteins, and the resulting ribonucleoprotein complex patrols the intracellular environment for targets that are complementary to the crRNA-guide sequence [80-88]. Identified target sequences are cleaved by dedicated nucleases [2,4,5,89]. Some CRISPR systems use crRNAs to target and cleave complementary RNAs, in a process that conceptually resembles eukaryotic RNAi [80,81,90]. However, most CRISPR systems use crRNAs to target invading DNA [91-95]. Many of these DNA-targeting systems rely on sophisticated multi-subunit complexes, but one of these systems relies on a single protein called Cas9. Cas9 can be programmed with an RNA to target virtually any complementary DNA sequence [91,96]. Because of this simplicity, the Cas9 system has recently been adopted by a rapidly expanding community of scientists for programmable genetic engineering in microorganisms, cell lines, plants, and animals (Table 1).

Figure 1. Repurposing RNA-guided nucleases from the CRISPR-mediated adaptive immune system in bacteria

A) CRISPR-mediated adaptive immunity proceeds in three distinct stages: acquisition of foreign DNA, CRISPR RNA (crRNA) biogenesis, and target interference. Bacteria acquire resistance to viral and plasmid challengers by integrating short fragments of foreign nucleic acid (called protospacers) into CRISPR loci encoded in the bacterial genome. Protospacers are selected from regions of the foreign genome that are flanked by a short sequence motif called a protospacer adjacent motif (PAM). CRISPR loci consist of a series of short repeats (R, black diamonds) and unique spacers (red and blue lines). CRISPR loci are transcribed, and the RNA is processed into a library of small CRISPR-derived RNAs (crRNAs). In some CRISPR systems (i.e. Type II), a trans-activating CRISPR RNA (tracrRNA) is essential for RNA processing and for recognition by Cas9 (CRISPR-associated protein 9). Cas9 is an RNA-guided, dsDNA-binding protein that uses two nuclease domains to cleave both strands of target DNA.

B) Cas9 targeting relies on PAM recognition and base pairing between the crRNA and the target DNA. The Cas9 nuclease can be easily programmed to target any DNA sequence with an adjacent PAM by designing a crRNA complementary to the target sequence. Genomic double-stranded DNA breaks are repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR). NHEJ is error-prone, resulting in insertions or deletions (indels) that disrupt the target site. HDR relies on a donor template that can be used to deliver foreign DNA at a specific location. Alternatively, nuclease-defective versions of Cas9 (Cas9D) have been tethered to transcription activators (TA) that promote gene transcription, or transcriptional repressors (TR) that inhibit transcription.
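The acquisition stage described above, in which each new spacer arrives together with a duplicated repeat, can be sketched as a list operation. This is a toy model under stated assumptions: the array is an alternating list that starts and ends with a repeat, new spacers are added at the leader-proximal end (the end we treat as "first"; the leader sequence itself is not modeled), and PAM-dependent protospacer selection is ignored. The helper names and sequences are hypothetical.

```python
# Toy model of spacer acquisition: the array is an alternating list
# [R, S1, R, S2, ..., R] that starts and ends with a repeat (R).

def acquire_spacer(array: list[str], new_spacer: str) -> list[str]:
    """Return a new array with `new_spacer` added at the leader-proximal end.

    The first repeat is duplicated so that the new spacer is flanked by
    repeats on both sides, preserving the repeat-spacer-repeat layout.
    """
    if len(array) % 2 == 0:  # also rejects an empty list
        raise ValueError("array must alternate R, S, ..., R")
    repeat = array[0]
    return [repeat, new_spacer] + array


def spacers(array: list[str]) -> list[str]:
    """Spacers sit at odd positions; index 1 is the most recently acquired."""
    return array[1::2]


R = "GTTTCAGA"  # hypothetical toy repeat
array = [R, "AACCGGTTAC", R]                 # one pre-existing spacer
array = acquire_spacer(array, "TGCATGCAAG")  # first new phage-derived spacer
array = acquire_spacer(array, "CCGGAATTCA")  # a later encounter
```

Because insertion is polarized, reading the spacers in order gives a chronological record of infections, newest first.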



Table 1. Applications for RNA-guided Cas9 proteins

Origin           Gene      Purpose     Application  Addgene #  Reference
S. pyogenes      Cas9      DSB         in vitro     39312      [96]
                 Cas9CSN   SSN         in vitro     39316
                 Cas9NSN   SSN         in vitro     39315
                 Cas9D     dead        in vitro     39318
L. innocua       Cas9      DSB         in vitro     39313
S. thermophilus  Cas9      DSB         in vitro     39314
N. meningitidis  Cas9      DSB         in vitro     39317
S. pyogenes      hCas9     editing     hum/mus      42234      [22]
S. pyogenes      hCas9     editing     hum/mus      42230      [16]
                 hCas9NSN  editing     hum/mus      42335
                 hCas9     editing     hum/mus      42229
                 hCas9NSN  editing     hum/mus      42333
S. pyogenes      hCas9     editing     hum/mus      41815      [25]
                 hCas9NSN  editing     hum/mus      41816
S. pyogenes      Cas9      editing     Bacteria     44250      [8]
                 Cas9D     CRISPRi     Bacteria     44249
                 hCas9D    CRISPRi     hum/mus      44246
                 hCas9D    CRISPRi     hum/mus      44247
S. pyogenes      Cas9      editing     Bacteria     42876      [7]
S. pyogenes      Cas9      editing     Zebrafish    42251      [35]
                 Cas9      editing     hum/mus      42252
S. pyogenes      hCas9     editing     hum/mus      43945      [15]
S. pyogenes      hCas9     editing     hum/mus      44719      [18]
                 hCas9NSN  editing     hum/mus      44720
S. pyogenes      hCas9     editing     Yeast        43802      [11]
                 hCas9     editing     Yeast        43804
S. pyogenes      hCas9     editing     hum/mus      44758      [59]
S. pyogenes      hCas9D    tracking    hum/mus      46910      [10]
                 hCas9D    repression  hum/mus      46911
                 hCas9D    activation  hum/mus      46912
                 hCas9D    activation  hum/mus      46913
                 hCas9D    repression  Yeast        46920
                 hCas9D    repression  Yeast        46921
S. pyogenes      hCas9     editing     Drosophila   45945      [30]
                 hCas9     editing     Drosophila   46294
S. pyogenes      wCas9     editing     C. elegans   46168      [41]
S. pyogenes      hCas9     editing     hum/mus      43861      Unpublished data (Joung Lab)
S. pyogenes      hCas9     editing     Plants       46965      [49]
S. pyogenes      zCas9     editing     Zebrafish    46757      [36]
                 zCas9     editing     Zebrafish    47929
N. meningitidis  Cas9      editing     hum/mus      47867      [13]
S. pyogenes      wCas9     editing     C. elegans   47549      [46]
S. pyogenes      wCas9     editing     C. elegans   47911      [39]
S. pyogenes      wCas9     editing     C. elegans   47933      [42]

Cas9CSN, (H840A mutant) complementary strand nickase; Cas9D, catalytically inactive/dead; Cas9NSN, (D10A mutant) non-complementary strand nickase; CRISPRi, CRISPR interference; DSB, double-strand DNA break; hCas9, mammalian codon optimized; hum/mus, human and mouse; SSN, single-strand nick; wCas9, worm codon optimized; zCas9, zebrafish codon optimized.
A CRISPR boom in biotechnology

Basic research on bacteriophages led to the discovery of DNA restriction endonucleases in the 1970s [97,98]. These enzymes transformed molecular biology by making it possible to cleave specific DNA sequences. Like restriction enzymes, CRISPR systems evolved as components of prokaryotic immune systems that efficiently target nucleic acids for sequence-specific cleavage. However, unlike DNA restriction enzymes, which typically bind to specific 4-8 bp regions of double-stranded DNA (dsDNA), CRISPR RNA-guided systems are extremely versatile and can be easily programmed to target virtually any RNA or DNA substrate. These new RNA-guided nucleases are now being exploited by genome engineers for programmed manipulation of nucleic acids in diverse model systems.
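A back-of-the-envelope calculation shows why a programmable RNA guide can be far more specific than a 4-8 bp restriction site. The sketch assumes an idealized random genome (equal, independent base frequencies) of 3.1 Gb, a palindromic 6-bp restriction site, and a 20-nt guide plus the two fixed G bases of an NGG PAM; the 20-nt length and NGG PAM are typical of S. pyogenes Cas9 but are assumptions here. Real genomes are not random and Cas9 tolerates some mismatches, so these are idealized expectations, not predictions.

```python
# Back-of-the-envelope: expected chance matches in a random genome.
# Assumptions: equal base frequencies, independent positions, 3.1 Gb genome.

GENOME_BP = 3.1e9


def expected_matches(fixed_bases: int, genome_bp: float, strands: int) -> float:
    """Expected random occurrences of a motif with `fixed_bases` specified positions."""
    return strands * genome_bp / 4 ** fixed_bases


# A palindromic 6-bp restriction site reads the same on both strands,
# so scanning one strand counts every site once.
six_cutter = expected_matches(6, GENOME_BP, strands=1)   # ~7.6e5 sites
# A 20-nt guide plus the two fixed G's of an NGG PAM, on either strand.
guide_pam = expected_matches(22, GENOME_BP, strands=2)   # ~3.5e-4 sites
# Only a 12-nt PAM-proximal seed plus GG, as if distal bases were ignored.
seed_pam = expected_matches(14, GENOME_BP, strands=2)    # ~23 sites
```

The last line hints at why off-target cleavage is a concern: if only the 12-nt seed reported by Jiang et al. [7] had to match, a random genome of this size would be expected to contain tens of chance sites.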

In 2007, Barrangou et al. demonstrated that the CRISPR-mediated immune system in S. thermophilus relied on the cas9 gene for protection from invading viruses [66]. To investigate the mechanism of protection and the fate of phage DNA during infection, Garneau et al. sequenced viral DNA isolated from infected cells and showed that both strands of the target DNA were cleaved, resulting in a blunt-ended cleavage product [91]. However, at this time the mechanism for generating small CRISPR-derived RNAs in this system was not understood. In 2011, Emmanuelle Charpentier's laboratory reported the identification of a trans-activating crRNA (tracrRNA) with sequences complementary to the repeat sequences of the CRISPR RNA [99]. They showed that processing of the long primary CRISPR transcript was dependent on the tracrRNA and an endogenous RNase III enzyme. Subsequently, Jinek et al. purified the Cas9 protein from Streptococcus pyogenes and showed that Cas9-mediated cleavage of dsDNA relied on both the crRNA-guide and the tracrRNA (Figure 1A) [96]. To simplify this two-RNA system, Jinek et al. made a single chimeric RNA by fusing the 3' end of the crRNA to the 5' end of the tracrRNA, and demonstrated that this RNA could target Cas9 to cleave virtually any DNA sequence by design. Similarly, Gasiunas et al. reported purification of the Cas9 protein from Streptococcus thermophilus and demonstrated programmable cleavage of dsDNA targets [100]. Together, the mechanistic insights in these papers offered the exciting new possibility of using RNA-guided nucleases to generate dsDNA breaks for targeted genome "editing."

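The single-guide design described above, fusing the 3' end of the crRNA to the 5' end of the tracrRNA, amounts to concatenation when both RNAs are written 5' to 3'. The sketch below assumes a short "GAAA" loop as the default linker and a 20-nt guide segment at the 5' end; both are assumptions, and the repeat and tracrRNA fragments are placeholders, not a complete or validated scaffold. Function names are hypothetical.

```python
# Sketch of the single-guide design: fuse the 3' end of a crRNA to the
# 5' end of a tracrRNA through a short loop. All sequences are 5'->3'.

RNA_BASES = set("ACGU")


def make_chimeric_guide(crrna: str, tracrrna: str, linker: str = "GAAA") -> str:
    """Return crRNA + linker + tracrRNA (3' of crRNA joined to 5' of tracrRNA)."""
    for name, seq in (("crRNA", crrna), ("tracrRNA", tracrrna), ("linker", linker)):
        if set(seq) - RNA_BASES:
            raise ValueError(f"{name} must contain only A, C, G, U")
    return crrna + linker + tracrrna


def retarget(chimera: str, new_guide: str, guide_len: int = 20) -> str:
    """Swap the 5' guide segment; the scaffold downstream is left unchanged."""
    if len(new_guide) != guide_len:
        raise ValueError(f"new guide must be {guide_len} nt")
    return new_guide + chimera[guide_len:]


# Placeholder sequences for illustration only; not a validated scaffold.
guide = "GACGCAUAAAGAUGAGACGC"        # hypothetical 20-nt guide segment
crrna_repeat_part = "GUUUUAGAGCUA"    # placeholder repeat-derived 3' part
tracr_part = "UAGCAAGUUAAAAU"         # placeholder tracrRNA 5' part

sgrna = make_chimeric_guide(guide + crrna_repeat_part, tracr_part)
sgrna2 = retarget(sgrna, "UUCGAGCAGGCUAACGGAUC")
```

`retarget` captures why the fusion made Cas9 easy to program: only the 5' guide segment changes between targets, while the rest of the RNA stays fixed.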
The principles of genome editing rely on cellular DNA repair systems. The dsDNA breaks introduced by designer nucleases are repaired by either non-homologous end joining (NHEJ) [101] or homology-directed repair (HDR) [102] (Figure 1B). NHEJ is an error-prone process that is often accompanied by insertion or deletion of nucleotides (indels) at the targeted site, resulting in a genetic knockout of the targeted region of the genome due to frameshift mutations or the introduction of a premature stop codon. Alternatively, HDR relies on template DNA containing sequences homologous to the targeted site to repair the double-stranded break. Prior to the discovery of CRISPR RNA-guided nucleases, the most advanced methods for genome editing involved sophisticated protein engineering of zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or homing meganucleases [103,104]. However, protein engineering is expensive, and the engineered enzymes sometimes cleave non-target sequences, resulting in off-target effects that are difficult to identify and sometimes toxic. In contrast to these previously existing technologies, CRISPR RNA-guided nucleases rely on simple Watson-Crick base-pairing rules that eliminate the need for sophisticated protein engineering. The efficiency and accuracy of RNA-guided genome editing are currently the subject of intense investigation [7,14,17,20,21,24,26,61].
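Why an NHEJ indel so often produces a knockout can be shown with a toy open reading frame: an indel whose length is not a multiple of 3 shifts the downstream codons and can expose a premature stop codon. This minimal sketch assumes the standard genetic code's stop codons and models only deletions at a chosen position; the sequence and helper names are hypothetical, and real repair outcomes vary from cell to cell.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop(cds: str) -> Optional[int]:
    """Return the index (in codons) of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete(seq: str, start: int, length: int) -> str:
    """Model an NHEJ deletion of `length` bases starting at `start`."""
    return seq[:start] + seq[start + length:]


def is_frameshift(indel_length: int) -> bool:
    """Indels that are not a multiple of 3 shift the downstream reading frame."""
    return indel_length % 3 != 0


# Hypothetical toy ORF: ATG CTA AGC GGC TGG TAA (stop at codon 5).
cds = "ATGCTAAGCGGCTGGTAA"
one_bp = delete(cds, 3, 1)    # frameshift exposes TAA right after ATG
three_bp = delete(cds, 3, 3)  # in-frame deletion removes one codon
```

The 3-bp deletion removes a single codon but keeps the frame, which is why in-frame indels do not always abolish gene function.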

In 2013, less than six months after the reports by Jinek et al. and Gasiunas et al. on the programmable cleavage of dsDNA by Cas9, two Science papers by Cong et al. and Mali et al. demonstrated how RNA-guided Cas9 nucleases could be used to edit genes in mouse or human cell lines [16,25]. To repurpose the Cas9 nuclease for targeted genome editing, the authors fused nuclear localization signals (NLSs) to a codon-optimized version of the cas9 gene and co-expressed this gene with plasmids expressing the tracrRNA and a crRNA-guide, or a single chimeric guide RNA (gRNA) [16,25]. Editing efficiencies by Cas9 were comparable to those achieved using ZFNs and TALENs, but using RNAs to program Cas9 for sequence-specific dsDNA breaks is simple, reliable, and cheap. In fact, Ding et al. recently compared the editing efficiency of TALENs and Cas9 at eight different loci in pluripotent stem cells and found that the Cas9-based system "consistently and substantially outperformed" TALENs across all loci [18]. In addition to relying on NHEJ to introduce genetic lesions at programmed cleavage sites, several papers have now demonstrated that simultaneous delivery of either single-stranded or dsDNA donors can be used to promote HDR [9,11,13,14,18,24,25,28,30,34,38,42,44,46,47,50,56]. A DNA donor identical to the wild-type sequence can be used to restore the original sequence, but DNA donors can also be used to introduce single-nucleotide mutations or new genes (Figure 1B). Programmed delivery of foreign DNA to specific locations in the genome suggests that CRISPR RNA-guided nucleases could be used for gene therapy to repair or replace defective genes.
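Designing a donor for HDR-mediated introduction of a single-nucleotide mutation can be sketched as slicing homology arms from the sequence around the edit. This is a minimal sketch under stated assumptions: the donor is written as one strand, `arm_len` (default 40) is a hypothetical choice, and the HDR outcome is modeled simply as copying the donor over the matching genomic span. It does not model repair efficiency or recutting of the edited site.

```python
def design_donor(genome: str, edit_pos: int, new_base: str, arm_len: int = 40) -> str:
    """Return a donor carrying a single-base substitution at `edit_pos`.

    The donor spans `arm_len` bases of homology on each side of the edit;
    these arms are what HDR uses to copy the change into the genome.
    """
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if edit_pos - arm_len < 0 or edit_pos + 1 + arm_len > len(genome):
        raise ValueError("homology arms extend past the available sequence")
    if genome[edit_pos] == new_base:
        raise ValueError("new_base matches the existing base; nothing to edit")
    left_arm = genome[edit_pos - arm_len:edit_pos]
    right_arm = genome[edit_pos + 1:edit_pos + 1 + arm_len]
    return left_arm + new_base + right_arm


region = "ACGT" * 30  # hypothetical 120-bp target region
donor = design_donor(region, edit_pos=60, new_base="G", arm_len=10)
# HDR outcome modeled as copying the donor over the matching genomic span.
edited = region[:50] + donor + region[71:]
```

A donor built from the unmodified sequence (no substitution) would correspond to the restore-the-wild-type case described in the text.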
These initial studies have been followed by a rapid succession of papers demonstrating the versatility of RNA-guided Cas9 nucleases for genome engineering. In the last eight months, there have been over 60 independent publications demonstrating how different versions of the guide RNA can be used to target Cas9 to specific sequences for genome engineering in cells as well as multicellular organisms. Cas9-based systems have been used to efficiently generate allelic modifications in early-stage embryos [14,29,30,33-37,55,57-61]. This method has been used to make biallelic knockouts in animals using a single-step process that is profoundly accelerating in vivo genetic studies in live animal systems [14,33-37,55,57-61]. Furthermore, delivery of multiple guide RNAs can be used to edit several genes in a single genome simultaneously or to excise large genomic segments located between two different cleavage sites [30,37,44,61]. This approach, called "multiplexing", has been used to knock out up to five genes in a single embryo [36,57,58,60]. Multiplexing may be particularly useful for knocking out redundant genes or parallel pathways.
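Excising a genomic segment with two guide RNAs can be sketched by locating both cut sites and joining the outer ends. The sketch assumes S. pyogenes Cas9 with an NGG PAM and a blunt cut 3 bp upstream of the PAM (well established for this enzyme but not stated in this section), considers top-strand targets only, and models end joining as perfect; real NHEJ junctions often carry small indels. Sequences and function names are hypothetical.

```python
def cut_site(genome: str, protospacer: str) -> int:
    """Return the blunt cut position for a top-strand protospacer followed by NGG.

    Assumes S. pyogenes Cas9, which cuts 3 bp upstream of the PAM; the
    return value is the index of the first base to the right of the break.
    """
    start = genome.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on the top strand")
    end = start + len(protospacer)
    pam = genome[end:end + 3]
    if len(pam) < 3 or pam[1:] != "GG":
        raise ValueError("protospacer is not followed by an NGG PAM")
    return end - 3


def excise(genome: str, cut_a: int, cut_b: int) -> str:
    """Join the outer ends of two breaks, dropping the segment between them."""
    left, right = sorted((cut_a, cut_b))
    return genome[:left] + genome[right:]


# Hypothetical toy locus with two guide targets, each followed by a PAM.
PS1 = "GATCCTAGGCATCGATCAAC"
PS2 = "CTGACTGTCAGTACCATGCA"
genome = "TTTTT" + PS1 + "TGG" + "AAAAAAAAAA" + PS2 + "AGG" + "TTTTT"

cut_a = cut_site(genome, PS1)
cut_b = cut_site(genome, PS2)
product = excise(genome, cut_a, cut_b)
```

Adding more protospacers to the same loop would model editing several loci at once, the other use of multiplexing described above.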

The early founders of Cas9-based genome engineering established a precedent for resource sharing by making their expression plasmids available to the scientific community at Addgene.org (Table 1). The accessibility of these plasmids, combined with the simplicity of programming Cas9, has contributed to the rapid implementation of this system for targeted genome engineering. However, the versatility of this platform permits the development of novel applications that go beyond site-specific double-stranded breaks for traditional genome editing. Recently, nuclease-defective mutants of Cas9 (Cas9D) have been used as programmable DNA-binding proteins with the potential to deliver diverse cargos to specific locations. To date, Cas9D has been used to regulate gene transcription in bacteria [6,8], yeast [10], and human cells [8,10,12,23,24,27] by fusing it to transcription factors and directing it to promoter regions of specific genes (Figure 1B). Together, the gene repression and activation capabilities of Cas9D-based systems provide a simple and efficient method for controlling global gene expression that will help untangle complex gene networks and facilitate the development of synthetic organisms with controllable gene expression patterns.

Defining the rules of engagement 

Understanding the molecular basis of RNA-guided DNA recognition by Cas9 is critical for implementing this system in a clinical setting. To understand the "rules of engagement", it is prudent to consider the context in which Cas9 evolved. In the first stage of adaptive immunity, foreign DNA (viral or plasmid) is inserted into the CRISPR locus of the host (Figure 1). Since the CRISPR locus is the template for generating crRNAs, each crRNA is complementary to at least two distinct targets: an invading phage or plasmid sequence (called a protospacer), and the "spacer" sequence in the CRISPR locus of the host. Cas9 avoids "self" (i.e. spacers in the CRISPR locus) and efficiently targets "non-self" (i.e. protospacers) through protein-mediated recognition of a short sequence motif called a protospacer adjacent motif (PAM). The PAM is an antigenic signature that may promote duplex destabilization so that the crRNA can access the single-stranded regions of the adjacent DNA sequence for complementary base pairing [96,100].
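The self versus non-self logic described above can be captured in a toy rule: the same guide matches both the host spacer and the phage protospacer, but only the phage copy is followed by a PAM. This sketch assumes an NGG PAM (as for S. pyogenes Cas9) and requires a perfect guide match; in this toy, the hypothetical repeat that follows the host spacer simply does not start with an NGG motif. Sequences and names are hypothetical.

```python
def is_cleaved(guide: str, site: str) -> bool:
    """Toy Cas9 rule: cleave only a perfect guide match followed by an NGG PAM.

    `site` is written in the same 5'->3' sense as the guide: protospacer
    first, then the next three bases, where a PAM would have to be.
    """
    n = len(guide)
    protospacer, pam = site[:n], site[n:n + 3]
    return protospacer == guide and len(pam) == 3 and pam[1:] == "GG"


GUIDE = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt spacer
REPEAT = "GTTTCAGA"             # hypothetical toy repeat

host_locus = GUIDE + REPEAT      # "self": spacer followed by a repeat
phage_target = GUIDE + "TGGAC"   # "non-self": protospacer followed by TGG
```

Real recognition tolerates some mismatches and uses more than an exact string comparison; the next paragraph describes how those rules were measured.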

To quantify how each of these recognition sequences contributes to target cleavage efficiencies by Cas9 from S. pyogenes, Jiang et al. generated a library of targets containing all possible nucleotide substitutions at each position in the protospacer and the PAM [7]. Their results clearly indicate that the NGG motif in the PAM region is the most potent antigen for Cas9 targeting, but an NAG or NNGGN PAM can also elicit Cas9 targeting. In addition to the PAM, they identified a 12-nucleotide "seed sequence" immediately upstream of the PAM that is critical for target recognition. However, the rules of engagement are complex, and different mutations display significantly different targeting defects [17]. Furthermore, there are many different variants of the Cas9 protein, and many of these proteins have different PAM recognition sequences [13]. Generally speaking, the 12 bases proximal to the PAM are crucial for target recognition, but there are position- and nucleotide-specific effects that alter targeting efficiencies. To determine how these rules apply in the context of human cells, several studies have recently evaluated Cas9-mediated off-target cleavage in cultured human cell lines [14,18,20,21,24,26]. These studies reveal that the length of the gRNA can alter the efficiency of Cas9 targeting and that higher concentrations of gRNA and Cas9 result in higher frequencies of off-target cleavage. These observations are critical to consider during experimental design, and Hsu et al. have developed web-based software to help experimentalists select target sequences that will minimize potential off-target effects [21].
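The recognition rules above (an NGG or weaker NAG PAM, and a 12-nt PAM-proximal seed) suggest a simple way to list candidate off-target sites: scan both strands for PAM-adjacent windows and count mismatches separately in the seed and the PAM-distal region. This is a hypothetical sketch, not the algorithm behind the software of Hsu et al. [21]; it counts plain mismatches only and ignores bulges, position-specific weights, and chromatin. The 20-nt guide, the `max_mismatches=3` cutoff, and all names are assumptions.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Site:
    strand: str             # "+" or "-"
    start: int              # 0-based start within that strand, read 5'->3'
    protospacer: str
    pam: str
    seed_mismatches: int    # mismatches in the PAM-proximal seed
    distal_mismatches: int  # mismatches in the PAM-distal remainder


def scan(genome: str, guide: str, pam_suffixes: tuple = ("GG",),
         seed_len: int = 12, max_mismatches: int = 3) -> list[Site]:
    """List candidate sites on both strands, ranked by seed then total mismatches.

    A window qualifies if the 3 bases after it end in one of `pam_suffixes`
    ("GG" for NGG; add "AG" to also accept NAG) and it differs from the
    guide at no more than `max_mismatches` positions.
    """
    n = len(guide)
    sites = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 2):
            pam = seq[i + n:i + n + 3]
            if pam[1:] not in pam_suffixes:
                continue
            window = seq[i:i + n]
            diffs = [a != b for a, b in zip(window, guide)]
            seed = sum(diffs[n - seed_len:])     # bases next to the PAM
            distal = sum(diffs[:n - seed_len])   # remaining 5' bases
            if seed + distal <= max_mismatches:
                sites.append(Site(strand, i, window, pam, seed, distal))
    sites.sort(key=lambda s: (s.seed_mismatches,
                              s.seed_mismatches + s.distal_mismatches))
    return sites


GUIDE = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt guide
OFF = "C" + GUIDE[1:]           # copy with one PAM-distal mismatch
genome = "TTTT" + GUIDE + "TGG" + "TTTT" + OFF + "TAG" + "TTTT"

ngg_only = scan(genome, GUIDE)
with_nag = scan(genome, GUIDE, pam_suffixes=("GG", "AG"))
```

Ranking by seed mismatches first reflects the observation that the PAM-proximal bases matter most; the one-mismatch copy next to a NAG PAM appears only when NAG is allowed.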

More than Cas9 

In the midst of the Cas9 frenzy, other important applications for the CRISPR machinery have been developed. In 2008, Brouns et al. identified a protein in Escherichia coli that exclusively binds and selectively cleaves long CRISPR transcripts into small crRNAs [74]. This protein, called Cas6e (formerly CasE or Cse3), is a member of a large family of extremely diverse proteins that bind and cleave different RNA sequences. These proteins represent a new class of RNA restriction enzymes with the potential to advance RNA biology in the same way that DNA restriction enzymes did 40 years ago. Recently, activatable CRISPR-associated RNA restriction endoribonucleases have been used for targeted gene regulation in E. coli [105] and for affinity purification of RNAs and ribonucleoprotein complexes [106-108].

F1000Prime Reports 2014, 6:3
http://f1000.com/prime/reports/b/6/3
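The crRNA-processing activity of Cas6e-like enzymes described in this section can be sketched as cutting a long CRISPR transcript at the same position inside every repeat copy. The sketch assumes a single cut per repeat at a fixed offset (`cut_offset`, a hypothetical parameter); real enzymes recognize repeat sequence and structure rather than a plain string position. The toy repeat and spacers are hypothetical.

```python
def process_transcript(pre_crrna: str, repeat: str, cut_offset: int) -> list[str]:
    """Cut every repeat copy `cut_offset` nt from its 5' end; return crRNAs.

    Each mature crRNA is repeat[cut_offset:] + spacer + repeat[:cut_offset];
    the fragments upstream of the first cut and downstream of the last cut
    are discarded.
    """
    if not 0 < cut_offset < len(repeat):
        raise ValueError("cut_offset must fall inside the repeat")
    cuts = []
    pos = pre_crrna.find(repeat)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = pre_crrna.find(repeat, pos + len(repeat))
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


REPEAT = "GUUUCAGA"  # hypothetical toy repeat (RNA)
pre = REPEAT + "AACCGGUUAC" + REPEAT + "UGCAUGCAAG" + REPEAT
crrnas = process_transcript(pre, REPEAT, cut_offset=5)
```

Each product carries a piece of repeat on both ends, which is how a sequence-specific RNA cleavage step turns one long transcript into a library of guides.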

A CRISPR future 

The simplicity of programming RNA-guided Cas9 nucleases has contributed to their rapid implementation by the genome engineering community. However, the extent of off-target cleavage and the influence of chromatin structure and modification states on Cas9 cleavage efficiencies remain poorly understood. Early screens for off-target modifications focused on a subset of loci with sequences similar to the authentic target. These initial efforts failed to reveal off-target modifications, but more recent studies performed using high-throughput techniques indicated that high concentrations of Cas9-based nucleases promote off-target cleavage [7,18,24,26,61]. At the same time, using higher concentrations of Cas9 also leads to more efficient gene disruption. This suggests a delicate balance between the efficiency and accuracy of these nucleases, and methods that enhance their specificity may have significant utility.

We anticipate that the pace of Cas9-based gene modification will continue to accelerate and that this system will be implemented in increasingly diverse model systems. The simplicity of this system will permit the rapid generation of genome-scale knockout libraries for complex model systems, including human cells, with the potential for ex vivo gene therapy in humans. Furthermore, nuclease-defective mutants of Cas9 will be tethered to an increasingly diverse array of accessory domains with functions that can be regulated with light (e.g. optogenetics) or chemical treatments. Collectively, Cas9-based technologies are revolutionizing contemporary molecular genetics.

We are still in the infancy of a rapidly evolving field that has focused almost exclusively on the utility of Cas9 proteins from a few organisms (Table 1). While the versatility of Cas9 proteins appears to be limitless, we anticipate that the biochemical, biophysical, and target recognition properties of CRISPR RNA-guided complexes from other systems (e.g. Cascade, Cmr, and Csm) will prove desirable in certain contexts. Moreover, we anticipate that anti-CRISPR proteins encoded by some viruses will interact with the components of these systems in unexpected ways that may lead to new applications for regulating or altering the function of these systems [67,109]. Basic research on the mechanisms of these systems will continue to be the fuel that drives innovation.